Properties of palindromes in finite words 

Mira-Cristiana ANISIU, Valeriu ANISIU, Zoltan KASA



Abstract 

We present a method which displays all palindromes of a given length 
from De Bruijn words of a certain order, and also a recursive one which 
constructs all palindromes of length n + 1 from the set of palindromes of 
length n. We show that the palindrome complexity function, which counts 
the number of palindromes of each length contained in a given word, has 
a different shape compared with the usual (subword) complexity function. 
We give upper bounds for the average number of palindromes contained 
in all words of length n, and obtain exact formulae for the number of 
palindromes of length 1 and 2 contained in all words of length n. 
1 Introduction

The palindrome complexity of infinite words has been studied by several authors 
(see [1], [3], [13] and the references therein). Similar problems related to the
number of palindromes are important for finite words too. One of the reasons is 
that palindromes occur in DNA sequences (over 4 letters) as well as in protein 
description (over 20 letters), and their role is under research ([9]). 

Let an alphabet $A$ with $\mathrm{card}(A) = q \geq 1$ be given. The set of the words over $A$ will be denoted by $A^*$, and the set of words of length $n$ by $A^n$.

Given a word $w = w_1 w_2 \ldots w_n$, the reversed of $w$ is $\tilde{w} = w_n \ldots w_2 w_1$. Denoting by $\varepsilon$ the empty word, we put by convention $\tilde{\varepsilon} = \varepsilon$. The word $w$ is a palindrome if $\tilde{w} = w$. We denote by $a^k$ the word $\underbrace{a a \ldots a}_{k \text{ times}}$. The set of the subwords of a word $w$ which are nonempty palindromes will be denoted by $\mathrm{PAL}(w)$. The (infinite) set of all palindromes over the alphabet $A$ is denoted by $\mathrm{PAL}(A)$, while $\mathrm{PAL}_n(A) = \mathrm{PAL}(A) \cap A^n$.
2 Storing and generating palindromes

An old problem asks if, given an alphabet $A$ with $\mathrm{card}(A) = q$, there exists a shortest word of length $q^k + k - 1$ containing all the $q^k$ words of length $k$. The answer is affirmative and was given in [6], [10], [4]. For each $k \in \mathbb{N}$, these words are called De Bruijn words of order $k$. This property can be proved by means of the Eulerian cycles in the De Bruijn graph $B_{k-1}$. If a window of length $k$ is moved along a De Bruijn word, at each step a different word is seen, all the $q^k$ words being displayed.
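As a minimal sketch of the sliding-window property, the following Python code builds one De Bruijn word with the standard Lyndon-word concatenation construction and lists its windows. This is only one of several known constructions and is not claimed to be the one used in the cited references; the names `de_bruijn_word` and `windows`, and the digit alphabet {0, ..., q-1} (with q at most 10, so that each letter prints as one digit), are assumptions made for illustration.

```python
def de_bruijn_word(q: int, k: int) -> str:
    """Return a linear De Bruijn word of order k over the digits 0..q-1.

    The cyclic sequence is built by concatenating Lyndon words whose
    length divides k; the first k-1 letters are then repeated at the end
    so that the word has length q**k + k - 1.
    """
    a = [0] * (k + 1)
    cyclic = []

    def extend(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                cyclic.extend(a[1:p + 1])
            return
        a[t] = a[t - p]
        extend(t + 1, p)
        for j in range(a[t - p] + 1, q):
            a[t] = j
            extend(t + 1, t)

    extend(1, 1)
    linear = cyclic + cyclic[:k - 1]
    return "".join(str(d) for d in linear)


def windows(w: str, k: int) -> list[str]:
    """All subwords seen by a window of length k moved along w."""
    return [w[i:i + k] for i in range(len(w) - k + 1)]


if __name__ == "__main__":
    w = de_bruijn_word(2, 3)
    print(w, windows(w, 3))
```

Checking that `len(set(windows(w, k)))` equals `q**k` confirms, for a given pair (q, k), that every word of length k is displayed exactly once.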

We ask if it is possible to arrange all palindromes of length $k$ in a similar way. The answer is in general no, excepting the case of the two palindromes $aba \ldots a$ and $bab \ldots b$ of odd length.

Proposition 1 Given a word $w \in A^n$ and $k \geq 2$, the following statements are equivalent:

(1) all the subwords of length $k$ are palindromes;

(2) $n$ is even, $k = n - 1$ and there exist $a, b \in A$, $a \neq b$, so that $w = (ab)^{n/2}$.

Furthermore, in this case the only palindromes of length $k$ in $w$ are $(ab)^{n/2-1}a$ and $(ba)^{n/2-1}b$.

Proof. Let us consider the first two palindromes $a_1 a_2 \ldots a_k$ and $b_1 b_2 \ldots b_k$ such that $a_2 a_3 \ldots a_k = b_1 b_2 \ldots b_{k-1}$, hence

$$a_{k-i+1} = a_i = b_{i-1} = b_{k-i+2}, \quad i = 2, \ldots, k.$$

It follows that, if $k = 2l$ ($l \geq 1$), we have $b_2 = a_1 = a_3 = \ldots = a_{k-1}$ and $b_3 = a_2 = \ldots = a_k$, and $a_1 a_2 \ldots a_k$ is a palindrome if and only if $a_1 = a_2 = \ldots = a_k$, hence $a_1 a_2 \ldots a_k = a^k$; it follows that $b_1 b_2 \ldots b_k = a^k$ too, and the two palindromes are equal.

If $k = 2l + 1$, we have $b_2 = a_1 = a_3 = \ldots = a_k$ and $b_3 = a_2 = \ldots = a_{k-1}$, hence $a_1 a_2 \ldots a_k = abab \ldots a$ ($a \neq b$) and $b_1 b_2 \ldots b_k = bab \ldots b$. If another palindrome follows, it must again be $abab \ldots a$ (equal to the first one). □

Remark 1 For $k = 1$, the maximum length of a word containing all distinct palindromes of length 1 (i.e. letters) exactly once is $n = q$.

It is obvious that for $k \geq 2$ it is not possible to arrange all palindromes of length $k$ in the most compact way. But each palindrome is determined by the parity of its length and its first $\lceil k/2 \rceil$ letters, where $\lceil \cdot \rceil$ denotes the ceil function (which returns the smallest integer that is greater than or equal to a specified number).
Proposition 2 All palindromes of length $k$ can be obtained from a De Bruijn word of length $q^{\lceil k/2 \rceil} + \lceil k/2 \rceil - 1$.

Proof. The De Bruijn word contains all different words of length $\lceil k/2 \rceil$. Each such word $a_1 \ldots a_{\lceil k/2 \rceil}$ can be extended to a palindrome by symmetry, for $k$ even, and by taking $a_{\lceil k/2 \rceil + 1} = a_{\lceil k/2 \rceil - 1}, \ldots, a_k = a_1$, for $k$ odd. □

Example 1 Let $k = 3$, $q = 3$ and the De Bruijn word of order $\lceil k/2 \rceil = 2$, $w_1 = 0221201100$. From each word of length 2 which appears in the given De Bruijn word, we obtain the corresponding palindrome of length $k = 3$:

02 → 020
22 → 222
21 → 212
12 → 121
20 → 202
01 → 010
11 → 111
10 → 101
00 → 000.



Let $k = 4$, $q = 2$ and the De Bruijn word of order $\lceil k/2 \rceil = 2$, $w_2 = 01100$. From each word of length 2 contained in 01100 we obtain by symmetry the corresponding palindrome of length $k = 4$:

01 → 0110
11 → 1111
10 → 1001
00 → 0000.
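The construction used in Proposition 2 and Example 1 can be sketched in a few lines of Python. The function name `palindromes_from_word` is hypothetical; the two input words are the De Bruijn words given in the example.

```python
def palindromes_from_word(w: str, k: int) -> list[str]:
    """Read every window of length ceil(k/2) in w and mirror it into a
    palindrome of length k (the middle letter is not repeated for odd k)."""
    h = (k + 1) // 2  # ceil(k / 2)
    result = []
    for i in range(len(w) - h + 1):
        half = w[i:i + h]
        # even k: mirror the whole half; odd k: mirror all but its last letter
        mirror = half[::-1] if k % 2 == 0 else half[-2::-1]
        result.append(half + mirror)
    return result


if __name__ == "__main__":
    print(palindromes_from_word("0221201100", 3))
    print(palindromes_from_word("01100", 4))
```

When `w` is a De Bruijn word of order `ceil(k/2)`, each palindrome of length `k` appears exactly once in the returned list.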

There are several algorithms which construct De Bruijn words, for example, 
in [IS], [IH], [7] and 0. 

We can generate recursively all palindromes of length $n$, $n \in \mathbb{N}$, using the difference representation. This is based on the following proposition.

Proposition 3 If $w_1, w_2, \ldots, w_p$ are all binary ($A = \{0, 1\}$) palindromes of length $n$, where $p = 2^{\lceil n/2 \rceil}$, $n \geq 1$, each identified with the integer it represents in base 2, then

$$2w_1, 2w_2, \ldots, 2w_p, \quad 2^{n+1} + 1 + 2w_1, 2^{n+1} + 1 + 2w_2, \ldots, 2^{n+1} + 1 + 2w_p$$

are all palindromes of length $n + 2$.

Proof. If $w$ is a binary palindrome of length $n$, then $0w0$ and $1w1$ are palindromes too, with values $2w$ and $2^{n+1} + 1 + 2w$, and they are the only palindromes of length $n + 2$ which contain $w$ as their central subword, which proves the proposition. □
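The recursion behind Proposition 3 can be sketched as follows. For a palindrome w of length n with integer value w, the word 0w0 has value 2w and the word 1w1 has value 2^(n+1) + 1 + 2w. In this sketch the function argument is the target length, so the inner palindromes have length n - 2; the function names and the convention that the empty word has value 0 are assumptions made for illustration.

```python
def binary_palindrome_values(n: int) -> list[int]:
    """Values (in base 2) of all binary palindromes of length n, increasing."""
    if n == 0:
        return [0]  # the empty word, given the value 0 by convention here
    if n == 1:
        return [0, 1]
    inner = binary_palindrome_values(n - 2)  # palindromes w of length n - 2
    zeros = [2 * w for w in inner]                     # 0w0 has value 2w
    ones = [2 ** (n - 1) + 1 + 2 * w for w in inner]   # 1w1 has value 2^(n-1) + 1 + 2w
    return zeros + ones


def brute_force_values(n: int) -> list[int]:
    """Reference implementation: test every n-bit string directly (n >= 1)."""
    result = []
    for v in range(2 ** n):
        s = format(v, f"0{n}b")
        if s == s[::-1]:
            result.append(v)
    return result


if __name__ == "__main__":
    print(binary_palindrome_values(3))
    print(binary_palindrome_values(4))
```

The list comes out sorted because every value of the form 2w is smaller than 2^(n-1), while every value of the second form is at least 2^(n-1) + 1.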
In order to generate all binary palindromes of a given length, let us begin with an example considering all binary palindromes of length 3 and 4 and their decimal representation:

length 3: 000 = 0, 010 = 2, 101 = 5, 111 = 7
length 4: 0000 = 0, 0110 = 6, 1001 = 9, 1111 = 15

The sequence of palindromes in increasing order based on their decimal value for a given length can be represented by their differences. The difference representation of the sequence 0, 2, 5, 7 is 2, 3, 2 (2 - 0 = 2, 5 - 2 = 3, 7 - 5 = 2), and the difference representation of the sequence 0, 6, 9, 15 is 6, 3, 6. A difference representation is always a symmetric sequence and the corresponding sequence of palindromes in decimal can be obtained by successive addition beginning with 0: 0 + 6 = 6, 6 + 3 = 9, 9 + 6 = 15. By direct computation we obtain the following difference representation of palindromes for length $n \leq 8$.
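The difference representations for small lengths can be recomputed directly. The sketch below enumerates the left halves of binary palindromes in increasing order and takes successive differences of the resulting values; the function names are hypothetical.

```python
def binary_palindromes(n: int) -> list[int]:
    """Decimal values of all binary palindromes of length n >= 1, increasing."""
    half = (n + 1) // 2
    values = []
    for i in range(2 ** half):
        left = format(i, f"0{half}b")
        right = left[::-1] if n % 2 == 0 else left[-2::-1]
        values.append(int(left + right, 2))
    return values


def difference_representation(values: list[int]) -> list[int]:
    """Successive differences of an increasing sequence."""
    return [b - a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, difference_representation(binary_palindromes(n)))
```

Each printed list reads the same in both directions, in line with the symmetry noted in the text.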
We can easily generalize and prove by induction that the difference representations can be obtained as follows.

For $n = 2k$ we have the difference representation:

$$a_1, a_2, \ldots, a_{2^k - 1},$$

from which the difference representation for $2k + 1$ is:

$$2^k, a_1, 2^k, a_2, 2^k, \ldots, 2^k, a_{2^k - 1}, 2^k.$$

For $n = 2k + 1$ we have the difference representation:

$$2^k, a_1, 2^k, a_2, 2^k, \ldots, 2^k, a_{2^k - 1}, 2^k,$$

from which the difference representation for $2k + 2$ is:

$$3 \cdot 2^k, a_1, 3 \cdot 2^k, a_2, 3 \cdot 2^k, \ldots, 3 \cdot 2^k, a_{2^k - 1}, 3 \cdot 2^k.$$

This representation can be generalized for $q > 2$. The number of palindromes in this case is $q^{\lceil n/2 \rceil}$.

For $n = 2k$ we have the difference representation:

$$a_1, a_2, \ldots, a_{q^k - 1},$$

from which the difference representation for $2k + 1$ is:

$$\underbrace{q^k, \ldots, q^k}_{q-1 \text{ times}}, a_1, \underbrace{q^k, \ldots, q^k}_{q-1 \text{ times}}, a_2, \underbrace{q^k, \ldots, q^k}_{q-1 \text{ times}}, \ldots, \underbrace{q^k, \ldots, q^k}_{q-1 \text{ times}}, a_{q^k - 1}, \underbrace{q^k, \ldots, q^k}_{q-1 \text{ times}}.$$

For $n = 2k + 1$ we have the difference representation:

$$\underbrace{q^k, \ldots, q^k}_{q-1 \text{ times}}, a_1, \underbrace{q^k, \ldots, q^k}_{q-1 \text{ times}}, a_2, \ldots, a_{q^k - 1}, \underbrace{q^k, \ldots, q^k}_{q-1 \text{ times}},$$

from which the difference representation for $2k + 2$ is:

$$\underbrace{(q+1)q^k, \ldots, (q+1)q^k}_{q-1 \text{ times}}, a_1, \underbrace{(q+1)q^k, \ldots, (q+1)q^k}_{q-1 \text{ times}}, a_2, \ldots, a_{q^k - 1}, \underbrace{(q+1)q^k, \ldots, (q+1)q^k}_{q-1 \text{ times}}.$$
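The interleaving rules for a general alphabet size can be checked numerically. In the sketch below, palindromes over the digits 0..q-1 are read as integers in base q, and `interleave` builds the predicted difference representation from the one for length 2k; all function names are assumptions made for illustration.

```python
from itertools import product


def palindrome_values(q: int, n: int) -> list[int]:
    """Base-q values of all palindromes of length n over 0..q-1, increasing."""
    half = (n + 1) // 2
    values = []
    for left in product(range(q), repeat=half):
        right = left[::-1] if n % 2 == 0 else left[-2::-1]
        value = 0
        for d in left + right:
            value = value * q + d
        values.append(value)
    return values


def diffs(values: list[int]) -> list[int]:
    return [b - a for a, b in zip(values, values[1:])]


def interleave(block_value: int, a: list[int], q: int) -> list[int]:
    """q-1 copies of block_value, then a_1, then q-1 copies, then a_2, ..."""
    block = [block_value] * (q - 1)
    out = list(block)
    for x in a:
        out.append(x)
        out.extend(block)
    return out


if __name__ == "__main__":
    q, k = 3, 1
    a = diffs(palindrome_values(q, 2 * k))
    print(a)
    print(interleave(q ** k, a, q))
    print(interleave((q + 1) * q ** k, a, q))
```

For q = 2 the blocks have a single element, and the rule reduces to the binary case stated earlier.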

3 The shape of the palindrome complexity functions

For an infinite sequence $U$, the (subword) complexity function $p_U : \mathbb{N} \to \mathbb{N}$ (defined in [17] as the block growth, then named subword complexity in [5]) is given by $p_U(n) = \mathrm{card}(F(U) \cap A^n)$ for $n \in \mathbb{N}$, where $F(U)$ is the set of all finite subwords (factors) of $U$. Therefore the complexity function maps each nonnegative number $n$ to the number of subwords of length $n$ of $U$; it verifies the iterative equation

$$p_U(n+1) = p_U(n) + \sum_{j=2}^{q} (j-1)\, s(j, n), \tag{1}$$

$s(j, n)$ being the cardinal of the set of the subwords in $U$ having the length $n$ and the right valence $j$. A subword $u \in F(U)$ has the right valence $j$ if there are $j$ and only $j$ distinct letters $x_i$ such that $u x_i \in F(U)$, $1 \leq i \leq j$.

For a finite word $w$ of length $n$, the complexity function $p_w : \mathbb{N} \to \mathbb{N}$ given by $p_w(k) = \mathrm{card}(F(w) \cap A^k)$, $k \in \mathbb{N}$, has the property that $p_w(k) = 0$ for $k > n$. The corresponding iterative equation is

$$p_w(k+1) = p_w(k) + \sum_{j=2}^{q} (j-1)\, s(j, k) - s_0(k), \tag{2}$$

where $s_0(k) = s(0, k) \in \{0, 1\}$ stands for the cardinal of the set of subwords $v$ (suffixes of $w$ of length $k$) which cannot be continued as $vx \in F(w)$, $x \in A$. We can write ([2]) in a condensed form

$$p_w(k+1) = p_w(k) + \sum_{j=0}^{q} (j-1)\, s(j, k). \tag{3}$$

The above relations have their correspondents in terms of left extensions of the subwords.
For an infinite sequence $U$, the complexity function $p_U$ is nondecreasing; more than that, if there exists $m \in \mathbb{N}$ such that $p_U(m + 1) = p_U(m)$, then $p_U$ is constant for $n \geq m$.

The complexity function for a finite word $w$ of length $n$ has a different behaviour, because $p_w(n) = 1$ (there is a unique subword of length $n$, namely $w$). It was proved ([H], [13], [IS], [2]) that the shape of the complexity function is trapezoidal.

Theorem 1 Given a finite word $w$ of length $n$, there are three intervals of monotonicity for $p_w$: $[0, J]$, $[J, M]$ and $[M, n]$; the function increases at first, is constant and then decreases with the slope $-1$.
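The shape described in Theorem 1 can be inspected on concrete words. The sketch below computes $p_w(k)$ for $k = 0, \ldots, n$ by collecting distinct subwords and then tests whether a list of values rises, stays flat, and falls with slope $-1$; the function names are hypothetical, and the shape test only checks a given list, it does not prove the theorem.

```python
def subword_complexity(w: str) -> list[int]:
    """p_w(k) for k = 0..len(w): number of distinct subwords of length k."""
    n = len(w)
    return [len({w[i:i + k] for i in range(n - k + 1)}) for k in range(n + 1)]


def is_trapezoidal(p: list[int]) -> bool:
    """True if p strictly increases, then is constant, then decreases by 1 per step."""
    d = [b - a for a, b in zip(p, p[1:])]
    i = 0
    while i < len(d) and d[i] > 0:   # increasing part
        i += 1
    while i < len(d) and d[i] == 0:  # constant part
        i += 1
    return all(x == -1 for x in d[i:])  # decreasing part with slope -1


if __name__ == "__main__":
    p = subword_complexity("00101100")
    print(p, is_trapezoidal(p))
```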

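The palindrome complexity function of a finite or infinite word $w$ is given by $\mathrm{pal}_w : \mathbb{N} \to \mathbb{N}$, $\mathrm{pal}_w(k) = \mathrm{card}(\mathrm{PAL}(w) \cap A^k)$, $k \in \mathbb{N}$. Obviously,

$$\mathrm{pal}_w(k) \leq p_w(k), \quad k \in \mathbb{N}, \tag{4}$$

and for finite words of length $|w| = n$,

$$\mathrm{pal}_w(k) \leq \min\{q^{\lceil k/2 \rceil}, n - k + 1\}, \quad k \in \{0, \ldots, n\}. \tag{5}$$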

The palindrome $u \in \mathrm{PAL}(w)$ has the palindrome valence $j$ if there are $j$ and only $j$ distinct letters $x_i$ such that $x_i u x_i \in \mathrm{PAL}(w)$, $1 \leq i \leq j$. We denote by

$$s_p(j, k) = \mathrm{card}\{u \in \mathrm{PAL}(w) \cap A^k : u \text{ has the palindrome valence } j\}, \tag{6}$$

and by $s_p(0, k)$ the cardinal of the set of subwords $v \in \mathrm{PAL}(w) \cap A^k$ (not necessarily suffixes or prefixes of $w$) which cannot be continued as $xvx \in \mathrm{PAL}(w)$, $x \in A$.

The palindrome complexity function of finite or infinite words satisfies the iterative equation

$$\mathrm{pal}_w(k + 2) = \mathrm{pal}_w(k) + \sum_{j=0}^{q} (j - 1)\, s_p(j, k). \tag{7}$$

Due to the fact that the number of even palindromes is not directly related to that of odd ones, we do not expect that $\mathrm{pal}_w$ is of trapezoidal shape, as was the case for the subword complexity function $p_w$.

For this reason we define the odd, respectively even palindrome complexity function as the restrictions of $\mathrm{pal}_w$ to odd, respectively even integers: $\mathrm{pal}^o_w : 2\mathbb{N} + 1 \to \mathbb{N}$, $\mathrm{pal}^o_w(k) = \mathrm{pal}_w(k)$; $\mathrm{pal}^e_w : 2\mathbb{N} \to \mathbb{N}$, $\mathrm{pal}^e_w(k) = \mathrm{pal}_w(k)$.

These functions have a trapezoidal form for short words; nevertheless, this is not true in general, as the following examples show.

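Example 2 The word $w_1 = 1\,0\,1\,0^{\ldots}\,1^{\ldots}\,0^{\ldots}\,1\,0$ with $|w_1| = 19$ has $\mathrm{pal}^o_{w_1}(1) = 2$, $\mathrm{pal}^o_{w_1}(3) = 3$, $\mathrm{pal}^o_{w_1}(5) = 1$, $\mathrm{pal}^o_{w_1}(7) = 2$, $\mathrm{pal}^o_{w_1}(9) = 1$ (see Fig. 1).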
Figure 1: Odd and even palindrome complexity function

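Example 3 The word $w_2 = 1^{\ldots}\,0^{\ldots}\,1\,0^{\ldots}\,1^{\ldots}\,0$ with $|w_2| = 22$ has $\mathrm{pal}^e_{w_2}(2) = 2$, $\mathrm{pal}^e_{w_2}(4) = 3$, $\mathrm{pal}^e_{w_2}(6) = 1$, $\mathrm{pal}^e_{w_2}(8) = 2$, $\mathrm{pal}^e_{w_2}(10) = 1$ (see Fig. 1).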
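The non-trapezoidal behaviour in Examples 2 and 3 can be reproduced with a short computation of the palindrome complexity function. In the sketch below the exponents of the two test words are an assumption: they are chosen so that the words have the block pattern and the lengths 19 and 22 stated in the examples and give the listed values, but they are not claimed to be the exponents used by the authors. The function name `palindrome_complexity` is hypothetical.

```python
def palindrome_complexity(w: str) -> list[int]:
    """pal_w(k) for k = 0..len(w): distinct palindromic subwords of length k."""
    n = len(w)
    found = [set() for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            s = w[i:j]
            if s == s[::-1]:
                found[j - i].add(s)
    return [len(s) for s in found]


# Assumed exponents (5, 2, 7) and (4, 6, 8, 2); see the note above.
W1 = "101" + "0" * 5 + "1" * 2 + "0" * 7 + "10"
W2 = "1" * 4 + "0" * 6 + "1" + "0" * 8 + "1" * 2 + "0"

if __name__ == "__main__":
    pal1 = palindrome_complexity(W1)
    pal2 = palindrome_complexity(W2)
    print([pal1[k] for k in range(1, len(W1) + 1, 2)])  # odd lengths
    print([pal2[k] for k in range(2, len(W2) + 1, 2)])  # even lengths
```

In both cases the values go down, up and down again, so neither restricted function is trapezoidal.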

Remark 2 The palindrome complexity for infinite words is not nondecreasing, as the usual complexity function is. Indeed, we can continue the word $w_1$ in Example 2 with $11001100\ldots$, and its odd palindrome complexity function will be as that for $w_1$, and then equal to 0 for $k \geq 11$. Similarly, we can continue $w_2$ in Example 3 with $1010\ldots$ to obtain an infinite word with the even palindrome complexity of $w_2$ till $k = 10$ and equal to 0 for $k \geq 12$.

4 Average number of palindromes 

We consider an alphabet $A$ with $q \geq 2$ letters.

Definition 1 We define the total palindrome complexity $P$ by

$$P(w) = \sum_{n=1}^{|w|} \mathrm{pal}_w(n), \tag{8}$$

where $w$ is a word of length $|w|$, and $\mathrm{pal}_w(n)$ denotes the number of distinct palindromes of length $n$ which are nonempty subwords of $w$.

Because the set of the nonempty palindromes in $w$ is denoted by $\mathrm{PAL}(w)$, we can write also $P(w) = \mathrm{card}(\mathrm{PAL}(w))$.

Definition 2 The average number of palindromes $M_q(n)$ contained in all words of length $n$ is defined by

$$M_q(n) = \frac{\sum_{w \in A^n} P(w)}{q^n}. \tag{9}$$
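Definitions 1 and 2 translate directly into a brute-force computation for small $n$. The sketch below is illustrative only: the function names are hypothetical, the alphabet is taken to be the digits 0..q-1 (so q is at most 10), and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from itertools import product


def total_palindrome_complexity(w: str) -> int:
    """P(w): number of distinct nonempty palindromic subwords of w."""
    found = set()
    n = len(w)
    for i in range(n):
        for j in range(i + 1, n + 1):
            s = w[i:j]
            if s == s[::-1]:
                found.add(s)
    return len(found)


def average_palindromes(q: int, n: int) -> Fraction:
    """M_q(n): average of P(w) over all q**n words of length n."""
    letters = "0123456789"[:q]
    total = sum(
        total_palindrome_complexity("".join(t))
        for t in product(letters, repeat=n)
    )
    return Fraction(total, q ** n)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, average_palindromes(2, n))
```

The running time grows like $q^n$, so this approach is only practical for short words.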
We can give the following upper estimate for $M_q(n)$.

Theorem 2 For $n \in \mathbb{N}$, the average number of palindromes contained in the words of length $n$ satisfies the inequalities

$$M_q(n) \leq \frac{q^{-(n-1)/2}(q + 3) + 2n(q - 1) + q^3 - 2q^2 - 2q - 1}{(q - 1)^2}, \quad \text{for } n \text{ odd},$$

$$M_q(n) \leq \frac{q^{-n/2}(3q + 1) + 2n(q - 1) + q^3 - 2q^2 - 2q - 1}{(q - 1)^2}, \quad \text{for } n \text{ even}. \tag{10}$$



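Proof. We have

$$\sum_{w \in A^n} P(w) = \sum_{w \in A^n} \sum_{\pi \in \mathrm{PAL}(w)} 1 = \sum_{w \in A^n} \sum_{k=1}^{n} \sum_{\pi \in \mathrm{PAL}(w) \cap A^k} 1 \leq \sum_{w \in A^n} \sum_{\pi \in \mathrm{PAL}(w) \cap A} 1 + \sum_{k=2}^{n} \sum_{\pi \in \mathrm{PAL}_k(A)} \sum_{\substack{w \in A^n \\ \pi \in \mathrm{PAL}(w) \cap A^k}} 1,$$

and

$$\sum_{w \in A^n} \sum_{\pi \in \mathrm{PAL}(w) \cap A} 1 \leq q^{n+1}. \tag{11}$$

For a fixed palindrome $\pi$, with $|\pi| = k$, the number of the words of length $n$ in which it appears as a subword at position $i$ ($1 \leq i \leq n - k + 1$) is $q^{n-k}$. But the position $i$ is arbitrary, so that there are at most $(n - k + 1)q^{n-k}$ words in which $\pi$ is a subword, these words being not necessarily distinct. It follows that

$$\sum_{w \in A^n} P(w) \leq q^{n+1} + \sum_{k=2}^{n} \sum_{\pi \in \mathrm{PAL}_k(A)} (n - k + 1)\, q^{n-k}.$$

The number of the palindromes of length $k$ is $q^{\lceil k/2 \rceil}$, therefore

$$\sum_{w \in A^n} P(w) \leq q^{n+1} + \sum_{k=2}^{n} (n - k + 1)\, q^{n-k+\lceil k/2 \rceil}$$

and

$$M_q(n) \leq q + \sum_{k=2}^{n} (n - k + 1)\, q^{-k+\lceil k/2 \rceil}.$$

We split the sum according to $k = 2j$, $j = 1, \ldots, \lfloor n/2 \rfloor$, respectively $k = 2j + 1$, $j = 1, \ldots, \lfloor (n - 1)/2 \rfloor$, and obtain

$$M_q(n) \leq q + \sum_{j=1}^{\lfloor n/2 \rfloor} (n - 2j + 1)\, q^{-j} + \sum_{j=1}^{\lfloor (n-1)/2 \rfloor} (n - 2j)\, q^{-j}.$$

Making use of $\sum_{j=1}^{s} q^{-j} = (1 - q^{-s})/(q - 1)$ and $\sum_{j=1}^{s} j q^{-j} = (q - q^{1-s}(s + 1) + s q^{-s})/(q - 1)^2$, it follows that $M_q(n)$ satisfies the inequalities in (10). □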
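The last step of the proof can be checked with exact arithmetic: the sum $q + \sum_{k=2}^{n} (n-k+1)\, q^{-k+\lceil k/2 \rceil}$ should agree with the closed forms in (10). The sketch below uses the identity $-k + \lceil k/2 \rceil = -\lfloor k/2 \rfloor$; the function names `bound_sum` and `bound_closed` are hypothetical.

```python
from fractions import Fraction


def bound_sum(q: int, n: int) -> Fraction:
    """q + sum over k = 2..n of (n - k + 1) * q**(-floor(k/2))."""
    return q + sum(
        (Fraction(n - k + 1, q ** (k // 2)) for k in range(2, n + 1)),
        Fraction(0),
    )


def bound_closed(q: int, n: int) -> Fraction:
    """Right-hand side of (10), for n odd or even (requires q >= 2)."""
    if n % 2 == 1:
        lead = Fraction(q + 3, q ** ((n - 1) // 2))
    else:
        lead = Fraction(3 * q + 1, q ** (n // 2))
    numerator = lead + 2 * n * (q - 1) + q ** 3 - 2 * q ** 2 - 2 * q - 1
    return numerator / (q - 1) ** 2


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, bound_sum(2, n), bound_closed(2, n))
```

For $n = 1$ both expressions reduce to $q$, which is the trivial bound coming from (11) alone.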
Corollary 1 The following inequality holds

$$\limsup_{n \to \infty} \frac{M_q(n)}{n} \leq \frac{2}{q - 1}. \tag{12}$$

Proof.

$$\limsup_{n \to \infty} \frac{M_q(n)}{n} = \max\left\{ \limsup_{n \to \infty} \frac{M_q(2n + 1)}{2n + 1}, \ \limsup_{n \to \infty} \frac{M_q(2n)}{2n} \right\}$$

$$\leq \max\left\{ \lim_{n \to \infty} \frac{q^{-n}(q + 3) + 2(2n + 1)(q - 1) + q^3 - 2q^2 - 2q - 1}{(q - 1)^2} \cdot \frac{1}{2n + 1}, \ \lim_{n \to \infty} \frac{q^{-n}(3q + 1) + 4n(q - 1) + q^3 - 2q^2 - 2q - 1}{(q - 1)^2} \cdot \frac{1}{2n} \right\} = \frac{2}{q - 1}.$$

□

We are interested in finding how large the average number of palindromes contained in the words of length $n$ is compared to the length $n$. The numerical estimations done for small values of $n$ show that $M_q(n)$ is comparable to $n$, but Corollary 1 allows us to show that for $q \geq 4$ this does not hold.



Corollary 2 For an alphabet with $q \geq 4$ letters,

$$\limsup_{n \to \infty} \frac{M_q(n)}{n} < 1. \tag{13}$$



In the proof of Theorem 2 we have used the rough inequality (11), which was sufficient to prove the result. In fact, it is not difficult to calculate exactly

$$S_{n,p} = \sum_{w \in A^n} \sum_{\pi \in \mathrm{PAL}(w) \cap A^p} 1 \quad \text{for } p = 1, 2. \tag{14}$$

This result has intrinsic importance.

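Theorem 3 The number of occurrences of the palindromes of length 1, respectively 2, in all words of length $n$ (counted once if a palindrome appears in a word, and once again if it appears in another one) is given by

$$S_{n,1} = q^{n+1} - q(q - 1)^n, \tag{15}$$

respectively by

$$S_{n,2} = q^{n+1} - \frac{q}{(q - 1)\sqrt{q^2 + 2q - 3}} \left( \left( \frac{q - 1 + \sqrt{q^2 + 2q - 3}}{2} \right)^{n+2} - \left( \frac{q - 1 - \sqrt{q^2 + 2q - 3}}{2} \right)^{n+2} \right). \tag{16}$$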
Proof. We use Iverson's convention [11]

$$[\alpha] = \begin{cases} 1, & \text{if } \alpha \text{ is true} \\ 0, & \text{if } \alpha \text{ is false} \end{cases}$$

and obtain

$$S_{n,1} = \sum_{w \in A^n} \sum_{a \in A} [a \text{ in } w] = q \sum_{w \in A^n} [a_1 \text{ in } w],$$

where $a_1$ is a fixed letter of the alphabet $A$. Then

$$S_{n,1} = q \sum_{w \in A^n} [a_1 \text{ in } w] = q \left( q^n - \sum_{w \in A^n} [a_1 \text{ not in } w] \right) = q^{n+1} - q(q - 1)^n.$$

We proceed similarly to calculate $S_{n,2} = \sum_{w \in A^n} \sum_{\pi \in \mathrm{PAL}(w) \cap A^2} 1$ and obtain

$$S_{n,2} = \sum_{w \in A^n} \sum_{a \in A} [aa \text{ in } w] = q \sum_{w \in A^n} [a_1 a_1 \text{ in } w],$$

where $a_1$ is again a fixed letter of the alphabet $A$. We denote $\varphi(n) := \sum_{w \in A^n} [a_1 a_1 \text{ in } w]$, for which $\varphi(2) = 1$ and $\varphi(3) = 2q - 1$. It is easier to establish a recurrence formula for $\psi(n) = q^n - \varphi(n) = \sum_{w \in A^n} [a_1 a_1 \text{ not in } w]$. The number $\psi(n)$ is obtained from:

- the number $(q - 1)\psi(n - 1)$ of words which do not end in $a_1$ and do not have $a_1 a_1$ in their first $n - 1$ positions;

- the number $(q - 1)\psi(n - 2)$ of words which end in $a_1$, have position $n - 1$ occupied by one of the other $q - 1$ letters and do not have $a_1 a_1$ in the first $n - 2$ positions.

It follows that $\psi$ satisfies the recurrence formula

$$\psi(n) = (q - 1)(\psi(n - 1) + \psi(n - 2)), \tag{17}$$

with $\psi(2) = q^2 - 1$ and $\psi(3) = q^3 - 2q + 1$. Its solution is

$$\psi(n) = \frac{1}{(q - 1)\sqrt{q^2 + 2q - 3}} \left( \left( \frac{q - 1 + \sqrt{q^2 + 2q - 3}}{2} \right)^{n+2} - \left( \frac{q - 1 - \sqrt{q^2 + 2q - 3}}{2} \right)^{n+2} \right),$$

and (16) follows from the fact that

$$S_{n,2} = q\,(q^n - \psi(n)). \tag{18}$$

□

The expression of $S_{n,2}$ from (16) allows us to improve Corollary 1.
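The counts in Theorem 3 can be verified for small parameters by enumerating all words. The sketch below compares a brute-force count with formula (15), with relation (18) evaluated through the recurrence (17), and with the closed form of $\psi$. Starting the recurrence from $\psi(0) = 1$ and $\psi(1) = q$ is an assumption made here for convenience (it is consistent with $\psi(2) = q^2 - 1$); the function names are hypothetical.

```python
from itertools import product
from math import sqrt


def psi(q: int, n: int) -> int:
    """Number of words of length n with no two consecutive copies of a fixed letter."""
    a, b = 1, q  # assumed starting values psi(0) = 1, psi(1) = q
    for _ in range(n):
        a, b = b, (q - 1) * (a + b)
    return a


def psi_closed(q: int, n: int) -> float:
    """Closed-form solution of the recurrence (floating point, q >= 2)."""
    d = sqrt(q * q + 2 * q - 3)
    r1, r2 = (q - 1 + d) / 2, (q - 1 - d) / 2
    return (r1 ** (n + 2) - r2 ** (n + 2)) / ((q - 1) * d)


def s_brute(q: int, n: int, p: int) -> int:
    """S_{n,p} for p in {1, 2}: sum over all words of the number of
    distinct palindromes of length p they contain."""
    total = 0
    for w in product(range(q), repeat=n):
        if p == 1:
            total += len(set(w))
        else:
            total += len({w[i] for i in range(n - 1) if w[i] == w[i + 1]})
    return total


if __name__ == "__main__":
    q, n = 2, 5
    print(s_brute(q, n, 1), q ** (n + 1) - q * (q - 1) ** n)
    print(s_brute(q, n, 2), q * (q ** n - psi(q, n)))
```

For $q = 2$ the sequence $\psi(n)$ is a shifted Fibonacci sequence, which gives a quick sanity check of the closed form.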
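Corollary 3 The following inequality holds

$$\limsup_{n \to \infty} \frac{M_q(n)}{n} \leq \frac{q + 1}{q(q - 1)}. \tag{19}$$

Proof. Taking into account the bound $(n - k + 1)q^{n-k}$ on the number of words of length $n$ which contain a fixed palindrome of length $k$, established in the proof of Theorem 2, and (18), we get

$$M_q(n) \leq \frac{1}{q^n} \left( S_{n,1} + S_{n,2} + \sum_{k=3}^{n} \sum_{\pi \in \mathrm{PAL}(A) \cap A^k} (n - k + 1)\, q^{n-k} \right) = \frac{S_{n,1} + S_{n,2}}{q^n} + \sum_{k=3}^{n} (n - k + 1)\, q^{-k + \lfloor (k+1)/2 \rfloor}.$$

But $0 < \left(q - 1 + \sqrt{q^2 + 2q - 3}\right)/2 < q$ and $-1 < \left(q - 1 - \sqrt{q^2 + 2q - 3}\right)/2 < 0$ for $q \geq 2$, hence $\lim_{n \to \infty} \psi(n)/q^n = 0$. Then

$$\limsup_{n \to \infty} \frac{M_q(n)}{n} \leq \lim_{n \to \infty} \frac{1}{n} \sum_{k=3}^{n} (n - k + 1)\, q^{-k + \lfloor (k+1)/2 \rfloor} \leq \sum_{k=3}^{\infty} q^{-k + \lfloor (k+1)/2 \rfloor} = \frac{q + 1}{q(q - 1)}.$$

□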



Corollary 4 The inequality (13) holds for $q = 3$ too.
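The value of the series in (19) and the range of $q$ for which the bound is below 1 can be checked by a short computation. The following worked derivation is a sketch added for illustration; it uses only the identity $-k + \lfloor (k+1)/2 \rfloor = -\lfloor k/2 \rfloor$ and the geometric series for $q \geq 2$.

```latex
\begin{align*}
\sum_{k=3}^{\infty} q^{-k+\lfloor (k+1)/2 \rfloor}
  &= \sum_{k=3}^{\infty} q^{-\lfloor k/2 \rfloor}
   = q^{-1} + 2\sum_{j=2}^{\infty} q^{-j} \\
  &= \frac{1}{q} + \frac{2}{q(q-1)}
   = \frac{q+1}{q(q-1)}, \\
\frac{q+1}{q(q-1)} < 1
  &\iff q^2 - 2q - 1 > 0
   \iff q > 1 + \sqrt{2}.
\end{align*}
```

Since $1 + \sqrt{2} \approx 2.41$, the bound in (19) is below 1 for every integer $q \geq 3$; for $q = 3$ it equals $2/3$, while for $q = 2$ it equals $3/2$ and gives no information about (13).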



It seems that (13) holds also for $q = 2$. Using a computer program we obtained some values for the terms of the sequence $M^*(n) = M_2(n)/n$, $n \geq 2$. The first values are: $M^*(n) = 1$, $n = 2, \ldots, 7$; $M^*(8) = 0.99750$; $M^*(9) = 0.98550$, which are close to 1. We tried greater values of $n$ and got

M*(20) = 0.89975, M*(21) = 0.89002, M*(22) = 0.88043,

M*(23) = 0.87101, M*(24) = 0.86177, ..., M*(30) = 0.81064.
The last value took a very long time to compute, so for greater values of $n$ we generated some random words $w_1, w_2, \ldots, w_\ell$ of length 100, respectively 200, 300, 400 and 500 over $A = \{0, 1\}$ and got some roughly approximate values $M^*(n) \approx (P(w_1) + \ldots + P(w_\ell)) / (n\ell)$. For $\ell = 200$ we obtained

M*(100) ≈ 0.53, M*(200) ≈ 0.39, M*(300) ≈ 0.32,
M*(400) ≈ 0.29, M*(500) ≈ 0.26.
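A sampling experiment of this kind can be sketched as follows. This is not the authors' program: the function names, the seed, and the use of Python's `random` module are assumptions, and the estimate varies from run to run unless the seed is fixed. Palindromes are collected by expanding around each of the $2n - 1$ possible centres.

```python
import random


def count_palindromes(w: str) -> int:
    """Number of distinct nonempty palindromic subwords of w."""
    found = set()
    n = len(w)
    for center in range(2 * n - 1):
        lo, hi = center // 2, (center + 1) // 2
        # grow the palindrome outwards while the two ends agree
        while lo >= 0 and hi < n and w[lo] == w[hi]:
            found.add(w[lo:hi + 1])
            lo -= 1
            hi += 1
    return len(found)


def estimate_ratio(n: int, samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of M_2(n) / n from random binary words."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        w = "".join(rng.choice("01") for _ in range(n))
        total += count_palindromes(w)
    return total / (samples * n)


if __name__ == "__main__":
    for n in (100, 200, 300):
        print(n, round(estimate_ratio(n, samples=200), 2))
```

Increasing `samples` reduces the statistical noise of the estimate at a proportional cost in running time.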

This method allows us to obtain the previous exactly computed values $M^*(20)$, $M^*(30)$ with two exact digits. These numerical results allow us to formulate the following

Conjecture The sequence $M_q(n)/n$ is strictly decreasing for $n \geq 7$.